Tests4Py: A Benchmark for System Testing

Authors: Martin Eberlein, Lars Grunske, Batuhan Serce, Marius Smytzek, Andreas Zeller
Publication date: 11/07/2023

Benchmarks are among the main drivers of progress in software engineering research, especially in software testing and debugging. However, current benchmarks in this field could be better suited for specific research tasks: they rely on weak system oracles such as crash detection, come with only a few unit tests, need more elaborative research, or cannot verify the outcome of system tests. Our Tests4Py benchmark addresses these issues. It is derived from the popular BugsInPy benchmark and includes 30 bugs from 5 real-world Python applications. Each subject in Tests4Py comes with an oracle to verify the functional correctness of system inputs. In addition, it enables the generation of system tests and unit tests, allowing for qualitative studies that investigate essential aspects of test sets, as well as extensive evaluations. These opportunities make Tests4Py a next-generation benchmark for research in test generation, debugging, and automatic program repair.

Comment: 5 pages, 4 figures
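The contrast the abstract draws between a weak crash-detection oracle and an oracle that verifies functional correctness can be illustrated with a minimal Python sketch. It is not the Tests4Py API: the names `Verdict`, `crash_oracle`, `functional_oracle`, and the toy subject `buggy_upper` are all hypothetical and exist only for illustration.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    """Outcome of running one system input (hypothetical, not Tests4Py's type)."""
    PASSING = "passing"
    FAILING = "failing"


def crash_oracle(program: Callable[[str], str], system_input: str) -> Verdict:
    """Weak oracle: only an uncaught exception counts as a failure."""
    try:
        program(system_input)
    except Exception:
        return Verdict.FAILING
    return Verdict.PASSING


def functional_oracle(
    program: Callable[[str], str],
    system_input: str,
    expected: Callable[[str], str],
) -> Verdict:
    """Functional oracle: the output must also match the expected result."""
    try:
        actual = program(system_input)
    except Exception:
        return Verdict.FAILING
    if actual == expected(system_input):
        return Verdict.PASSING
    return Verdict.FAILING


def buggy_upper(text: str) -> str:
    """Hypothetical subject: should upper-case its input but drops the last character."""
    return text[:-1].upper()


if __name__ == "__main__":
    # The bug never crashes, so the crash oracle cannot see it.
    print(crash_oracle(buggy_upper, "abc"))
    # Comparing against the expected behavior exposes the wrong output.
    print(functional_oracle(buggy_upper, "abc", str.upper))
```

In this sketch the silent bug in `buggy_upper` goes unnoticed by the crash oracle, while the functional oracle flags it. This is the kind of gap a per-subject oracle is meant to close.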

arXiv.org e-Print Archive